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Abstract. We review our recently developed methods for large-scale electronic 
structure calculations, both in one-electron theory and many-electron theory. 
The method are based on the density matrix representation, together with the 
Wannier state representation and the Krylov subspace method, in one-electron 
theory of a-few-tens nm scale systems. The hybrid method of quantum mechanical 
molecular dynamical simulation is explained. The Krylov subspace method, the 
CG (conjugate gradient) method and the shifted-COCG (conjugate orthogonal 
conjugate gradient) method, can be applied to the investigation of the ground 
state and the excitation spectra in many-electron theory. The mathematical 
foundation of the Krylov subspace method for large-scale matrix computation 
is focused and the key technique of the shifted-COCG method, e.g. the collinear 
residual and seed switching, is explained. A wide variety of applications of these 
extended novel algorithm is also explained. These are the fracture formation and 
propagation, liquid carbon and formation process of gold nanowires, together with 
the application to the extend Hubbard model. 
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1. Introduction 

Large-scale matrix computation is crucial in the electronic structure theory, both 
in one-electron theory for large-scale systems and in many-body theory for strongly 
interacting electron systems. Interplay of the electronic structure and nano-scale 
atomic structure plays an essential role in physical properties of nanostructure 
materials and the Order-N algorithm has been extensively investigated. The size 
of the Hilbert space grows exponentially with linear increase of the system size in 
many-electron problems. 

Very important ten algorithms were invented in the 20th century. [TJ [5] These 
algorithms are the Krylov subspace method, the QR algorithm, the Householder 
algorithm, the Fast Fourier Transformation (FFT) etc, where the most of them are of 
the matrix algebra and the order-N algorithm. The FFT algorithm is one of the basis 
of the local density approximation (LDA) in the density functional theory (DFT) and 
the Lanczos method, one of the Krylov subspace method, is that of the many-body 
electron theory. The efficiency of the modern Krylov subspace method seems not to be 
widely known in the field of electron theory, both in LDA and many-electron theory. 

In one-electron theory or DFT, the primarily important states are the states 
near the Fermi energy or the band gap. Then the standard mathematical tool is 
the diagonalization of the Hamiltonian matrix. This may be a serious difficulty in 
large scale systems. In many electron theory, the difficulty is the huge size of the 
Hamiltonian matrix and the resultant memory size and computational time. These 
are just the targets of the field of the large-scale matrix computation mentioned above. 

In this paper, we report our recent activity in (1) developing the 
quantum mechanical molecular dynamical (MD) simulation method with the exact 
diagonalization, the Wannier states representation and the Krylov subspace method 
in nano-scale systems up to a few 10 nm size and (2) the investigation of many- 
electron problem, i.e. the degenerated orbital extended Hubbard Hamiltonian of the 
size of 6.4 x 10 7 , with the Krylov subspace method. We explain the key aspects in 
one-electron theory in a large-scale system and the many-electron theory in Sect. II. 
Section III is devoted to the explanation of the Krylov subspace methods. Several 
applications are reviewed in Sect. IV and the summary is given in the last section. 

2. One-electron theory vs. many-electron theory 

2.1. One-electron spectrum in large-scale systems 

2.1.1. Density matrix formulation 

The LDA calculation is based on the variational principles and usually on eigen- 
hmction representation of the ground state. However, the eigen-functions are not 
always necessary in actual calculation nor useful in numerical investigation of large- 
scale systems. Instead, one can construct the formulation with the one-body density 
matrix. [3] Any physical property can be represented by the density matrix p as 



where X is an operator of the physical property X and i and j denote atomic sites 
and orbitals. Energy and forces acting on an individual atom can be calculated by 
replacing X by the Hamiltonian or its derivative. Therefore, one needs only (i,j) 




(1) 
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elements of the density matrix p corresponding to non-zero but not all elements. 
The density matrix is given as 

(occ) 

p=Y J \^){K\, (2) 

where \<f> a ) is the eigenstates or the Wannier states and the summation is restricted 
within the occupied states. It can be also written as 

where Gij is the Green's function defined as 

G l3 {e) = [{e + i5 (4) 

Here, p, k^, T and / are the chemical potential, the Boltzmann constant, the 
temperature and the Fermi-Dirac distribution function, respectively. 

We have developed a set of computational methods for electronic structure 
calculations, i.e. the generalized Wannier-state method, [UH31S] the Krylov subspace 
method (the subspace diagonalization method [Jj and the shifted COCG method [8]) 
and the generalized Wannier-state solver with parallelism. [9] These methods are ones 
for calculating the one-body density matrix and/or the Green's function for a given 
Hamiltonian. Calculation was carried out using the tight-binding formalism of the 
Hamiltonian. These methods can be used in a hybrid way as is explained in !2.1.5l [10 



2.1.2. Wannier state representation 

The Order-N algorithm can be constructed in semiconductors and insulators on 
the basis of the Wannier state representation. The generalized Wannier states are 
localized wavefunctions in condensed matters obtained by the unitary transformation 
of occupied eigenstates, [TTJ [T2l H] an d also obtained by an iterative way, starting a 
trial localize wavefunctions, with a mapped eigen-value equation [3] 

^ WS) > = 4^ WS) >, (5) 

where 

flfs = H + 2»bft ~ Hp, - - 9l H (6) 

OCC. 

and the energy parameter rj s should be much larger than the highest occupied level. 
Once one obtains the Wannier states, the density matrix can be easily constructed 
by Eq. ([2]) and the force acting on each atom can be calculated. We observed that 
the bond forming and breaking processes are well described in the localized Wannier 
states as changes between a bonding and non-bonding orbital. The Wannier states 
depend upon the local environment and the above iterative procedure is suitable to 
the MD simulation. 



2.1.3. Krylov subspace method 

In metallic systems, the Krylov subspace method is very useful to achieve the 
computational efficiency (accuracy and speed). [7J [5] The Green's function can be 
calculated in the Krylov subspace and one calculates the density matrix by Eq. @. 
Details are explained in Sec. [3] The Krylov subspace method is, of course, applicable 
to semiconductors and insulators, too. 
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2.1. 4-. Comparison among solver methods and Order-N character 

Figure Q] demonstrates our methods for 10 2 -10 7 atoms with and without parallel 
computation, [5J O UOj where the computational time is shown for the standard 
eigen-state solver (EIG) and our developed solver methods, Wannier-state solver with 
variational procedure (WS-VR), Wannier-state solver with perturbation procedure 
(WS-PT) and Krylov subspace solver with subspace diagonalization (KR-SD). Parallel 
computations are achieved by the Open MP technique (http://www.openmp.org). 
The Hamiltonian forms used here are the Slater-Koster-form ones of silicon 13 and 
carbon (T5], the linear- muffin-tin orbital (LMTO) theory [T5] in a form of the first- 
order for copper. Among the data in Fig.[T]except one by the eigen-state solver, 
the computational cost is 'order- N' or linearly proportional to the system size (TV), up 
to ten-million atoms and shows a satisfactory performance in parallel computation. 
The computational performance of the Wannier-state methods can be faster, at best 
by several hundred times, than that of the Krylov subspace method (See Fig. [U for 
example), particularly, if a dominant part of wavefunctions are well localized. Now 
the program package ('ELSES' = Extra Large-Scale Electronic Structure calculation) 
is being prepared. [16] 
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Figure 1. The computational time as a function of the number of atoms (N). 
[5l l6lll0l The time was measured for metallic (fee Cu and liquid C) and insulating 
(bulk Si) systems with up to 11,315,021 atoms, by the conventional eigenstatc 
calculation (EIG) and by our methods for large systems; KR-SD, WS-VR and 
WS-PT methods. See the original papers 5 6 10 for the details of parallel 
computation. 



2.1.5. Multiple solver method 

Since our method based on the density matrix formulation, we can construct 
another very important method "the multiple solver method" . The basic idea is the 
division of the Hilbert space; 

p = p A + Pb, PaPb = 0, (8) 

and the calculation can be done independently on different parts A and B. The 
importance is the fact that this hybrid method is completely within the quantum 
mechanical framework. Then we can use this hybrid scheme as the multiple solver 
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method in nano-scale systems and the calculated results does not cause any artificial 
discontinuity of physical quantities which might always occur in the hybrid scheme 
of naive simple division of the physical space. The choice or hybrid of the solvers 
is important for an optimal calculation with a proper balance between accuracy and 
computational cost. [5l [TO] 

2.2. Ground state of many- electron theory and excitation spectra 

In many-electron theory, we usually treat large matrices and the calculation becomes 
more and more difficulty, since the occupation freedom of one site grows exponentially 
and the matrix size is extraordinarily large. [17] In order to get the precise eigen-energy 
and eigen-vector of the ground state, one should use the Lanczos method and the CG 
method (the inverse iteration method) simultaneously. The Lanczos method is useful 
to get the approximate eigen-energy and eigen vectors of the ground state. However, 
the orthogonality of the generated basis vectors is broken at low iteration steps and 
the precision of the ground state energy and wavefunction could not be preserved. 
Then we use the CG method (the inverse iteration method) to improve them. After 
estimating the accuracy of the calculation (the norm of residual vector), we should 
repeat this procedure till the enough accuracy is obtained. 

After we obtain the wavefunction of the ground state, we should analyze the 
properties of excitations in a wide range of energy. For this purpose, the shift 
property of the COCG method, i,e, the sifted COCG method, can be a powerful 
tool. The advantages of the shifted COCG method is an efficient algorithm of solving 
shifted linear equations, the error-monitoring ability during the iterative calculation 
and the robustness. Then, it is very suitable for the problems in the many-electron 
problem. [TJ] The detailed explanation is given in Sec(31 

3. Krylov subspace method 

3.1. Krylov subspace 

We consider the simultaneous linear equations 

[{e + i5)\-H]\x j ) = \j), (9) 

for a given vector \j), real numbers e and 5. 1 is the unit matrix. When H is a huge 
N x N matrix, the inverse of H or [(e + iS)l — H] is not easily obtained or impossible 
to obtain and the iterative method becomes a useful concept. One can obtain an 
approximate eigen vector \xj) in a subspace spanned by vectors {H n \j)}\ 

K v (H,\j))=s Pl m{\j), H\j), H 2 \j), ••• , H"- 1 ^)}. (10) 

This subspace K^^H, \ is called the Krylov subspace. The basic theorem of the 
Krylov subspace is the invariance of the subspace under a scalar shift crl; 

K v (H,\j))=K v (al + H,\j)). (11) 

Lanczos found a new powerful way to generate an orthogonal basis for such 
subspace when the matrix is symmetric. 19 Hestenes and Stiefel proposed an elegant 
method, known as the conjugate gradient (CG) method, for systems that are both 
symmetric and positive definite. [20) 
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3.2. Sub space- diagonalization 

The first method is to find eigen- vectors {|u4"^)} approximated in K U (H, Ij'}) by 
diagonalizing the reduced Hamiltonian matrix 

H K "( 6 >M) = {(kW\H\K$)}, (12) 

where {\Km^)\m = 1, • • • ,is} is the orthogonalized basis set of the Krylov subspace 
K v { y II 1 \ j)), which satisfies the three-term recurrence relation and constructed by the 
Lanczos process or the Gram-Schmidt process. In this subspace we can calculate the 
density matrix very easily. [7] 

The subspace diagonalization method may be accurate enough for the several 
purposes in one-electron spectra in large-scale systems and calculation of total and 
local density of states. However, the orthogonality would be broken, when we use a 
larger number of the subspace dimension, for the basis vectors satisfying the three-term 
recurrence relation, pj [5] Therefore, the numerical accuracy may be limited when one 
need finer structure of spectra and we should extend the methodology to the shifted 
COCG method. 

In many-electron theory, the Lanczos method is widely used for obtaining 
the eigen-energy and many-electron wavefunction of the ground state in the exact 
diagonalization method. The accuracy can be greatly improved when we use the CG 
method. 

3.3. Shifted-COCG method and seed- switching technique 

When the matrix (eol — H) is real symmetric, then one can use the CG method for 
an iterative solution of the simultaneous linear equation (eol — ff)x = b. One should 
introduce the infinitesimal small (but finite) imaginary number iS for the Green's 
function and the matrix (eq + iS)l — H is complex symmetric. Then we can use the 
conjugate orthogonal conjugate gradient (COCG) method for solving the equation 
{(e +iS)l-iT}x = b. ® 

Since the energy parameters e are arbitrary given or continuously changing in a 
wide energy range, one should solve also the shifted linear equations 

[(e +o- + i6)l-H] \x<f ) ) = \j), (13) 

with a fixed energy (seed) £o- The energy shift parameter a can be even complex. 
The shifted-COCG method was constructed, [5J [TH] in which the theorem of collinear 
residual [H] for the shifted linear systems is applied to the COCG method. The 
essential property is based on the basic invariance theorem of the Krylov subspace 
Eq. under an energy shift Eq + a from Eq. Therefore, the Krylov subspace 

for the equation [(e + cr + i5)l — H] \xj) = \j) can be generated from that of 

[(eq + i5)l — H] \xj) = \j) of a selected seed energy e . The very important fact is 
that this shift procedure is scalar linear calculation. Essential cost for solving Eq. ([9]) 
should be paid only for the seed energy Eq and the rest is a scalar linear calculation 
which is negligible. 

The choice of the seed energy is not unique and sometime the calculations cannot 
be finished under a required criteria. Then one should continuously change the energy 
parameter and choose a new seed energy e + in again. Essentially important point 
is that we can continue the calculation with a new seed energy, keeping calculated 
information of the former seed energy. This is another very important property called 
'seed-switching'. [22l[T8] 
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3.4- Accuracy control with residual vector and robustness of shifted COCG method 

It is essentially important to know the accuracy of the solution during the iteration 
procedure and we can monitor the convergence behavior of the iterative solutions of 
the Krylov subspace method. 

The residual vector can be defined both in the subspace diagonalization and 
shifted COCG method 0] as 

\rf ) ) = {e + \5-H)\xf ) )-\j), (14) 

where \Xj } is the z/-th iterative solution. This residual vector can be monitored during 
the iterative calculation and we can stop the iterative procedure, without fixing the 
dimension of the Krylov subspace, once one can obtain the required accuracy. The 
norm of the residual vector can give the upper limit of the accuracy of the Green's 
function itself. [18] 

The shifted COCG method is numerically robust and one can reduce the norm 
of the residual vector to the machine accuracy. Therefore, the shifted-COCG method 
may be used to calculate accurate or fine density of electronic states in one-electron 
spectrum in large-scale systems or the fine excitation spectra in many-electron 
problems. 

4. Applications 

4--1- Application to nano-scale systems 

4-1.1. Formation and propagation of fracture in silicon crystal 

In this subsection, we present an application study of our simulation; fracture 
formation and propagation phenomena in Si nano-scale crystal, [3 [6] where the 
Hamiltonian is given as a tight-binding representation. 43] The calculation was carried 
out by the Wannier-state methods with up to 10 5 atoms. 




Figure 2. Silicon cleavage dynamics. [B] A 14-nm-scale simulation result shows 
the bending of cleavage path from an unstable (OOl)-likc plane into experimentally 
observed (lll)-like and (llO)-like planes. The right panel shows a step-formation 
process on Si(lll)-2xl surface. 

In the dynamical fracture formation process on the (001) plane, two bonds are 
broken and an asymmetric dimer (2x1 periodicity on the resultant (001) surface) 
is formed after thermal motions of a time about 0.4 ps. First, bonds are broken 
successively in an atom array of the dimer bonds on the plane along one of [110] 
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directions. Along the formed asymmetric dimer bonds, the inter-atomic distance is 
shortened due to the formed bonding bonds. The distortion energy is accumulated 
and, then, other bonds along a parallel atom array, but not the same, are broken. 
This fracture propagation (perpendicular to the direction of the formed asymmetric 
dimer bonds) is governed by the accumulated distortion energy. Our calculation can 
represent this surface breaking mechanism on the (001) plane of Si crystals. 5J 

We also studied with 14 nm scale Si crystals the easy-propagating plane of 
fracture. [6] It is widely known that the easy-propagating plane of fracture in Si is 
(110) or (111) planes. In case of fracture on the (111) plane, the (lll)-(2 x 1) surface 
reconstruction appears (the Pandy structure [23]) and several steps are formed. The 
fracture propagation plane is not explained by the energy of established stable surfaces 
but by that of ideal or transient surface structure without reconstruction. In a MD 
process in a larger systems with 14nm length, even if a fracture propagation starts 
on a (001) plane, the plane of the fracture propagation changes to (111) and (110) 
planes. Figure shows examples of the simulation results. 
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Figure 3. Molecular dynamical simulation of liquid carbon by the MD simulation 
with the Krylov subspace method, (a) The pair correlation function and (b) the 
electron density of states. 



4-1-2. Liquid carbon 

Liquid carbon of 13,824 atoms was simulated with the Krylov subspace 
method. [10] The density and the temperature are set to be p = 2.0 g cm 3 and 
T = 6000 K. The time interval of a MD step is At — 1 fs and the subspace dimension 
and the number of interacting atoms are chosen to be v — 30 and Apr = 200, 
respectively. Figure Ela) shows the resultant pair correlation (PC) function with 
comparison of the conventional eigenstate method of 216 atoms and we should notice 
that the two graphs are identical. Figure [3jb) shows the electronic density of states 
(DOS) of a system of with 13,824 atoms from the Greenfs function, by the Krylov 
subspace method. The DOS calculation was achieved with the controlling parameters 
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of a heavier computational cost (y = 300 and -ZVpr = 1000) and r\ = 0.05 eV. Since 
the present Hamiltonian includes only s and p orbitals, the resultant DOS misses a 
structure in higher energy regions. The resultant DOS shows the characteristic profile 
of liquid carbon, e.g. a narrow it band appears between —5 and +5 eV as in carbon 
nanotubes. The ir bond in the liquid phase is imperfect and non-bonding (atomic) p 
states appear as a sharp peak near the chemical potential (e ~ 0.6 eV). 

4-.1.3. Helical multishell structure of gold nanowire 

Another application is the formation process of helical multishell gold 
nanowires. [23] Gold nanowires obtained by TEM thinning process have helical 
multishell structures along the original [110] axis with helicity, and the outermost 
shell is a (lll)-like atomic sheet. [25] The difference of numbers of atoms between the 
outermost and the next outermost shells is seven, called 'magic number', except cases 
of five and seven atoms on the outermost shell. 

We proposed the two-stage formation model of Au nanowires where the driving 
force for the helicity is the atom row slip. At the first stage, the outermost shell is 
dissociated from the inner shell to rotate freely. At the second stage, an atom row on 
the outermost shell slips and the (001) faces on the rod surface transform into (111) 
surfaces. 



(a) (b) 




Figure 4. The formation process of the helical multishell structure in gold 
nanowires. Non-helical structure (left) is transformed into helical one (right). 
(a)Formation process of (11-4) helical multishell gold nanowire. |24| The symbol 
(11-4) means that the number of atoms is eleven and four in the outer and inner 
shell, respectively. Atom row slip along the wire axis introduces the helicity as 
shown by three lines, (b) Transformation of structure of longer gold nanowire of 
1,020 Au atoms and the resultant (15-8-1) nanowire contains several defects. 

We verified the above two-stage model by using the MD simulations with a tight- 
binding Hamiltonian, [26] starting from an 'ideal' nanowire of stacking (110) sections 
of the fee lattice. The calculation was carried out by the eigen-state solver with about 
80-1020 atoms. Here, we show results of 143 atoms in Fig. Ufa) [23] and those in 
larger systems of 1020 atoms in (b). The total energy decreases almost monotonically 
after 1,000 MD steps (1MD step = lfs). First, the surface atoms dissociate from inner 
shell and, then, can move rather freely. From 2,000 to 5,000 MD steps, (001) sheet 
reconstructs into hexagonal (lll)-like surface with an atom- row slip deformation, and 
the helical structure on the surface appears. The inner shell rotates at the same time 
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of the atom-row slip. Analysis of electronic structure shows that the mechanism in 
both stages is governed by the d-band electrons extending over the (lll)-like surface, 
where the center of gravity of the d-band locates in the lower energy side. The helical 
nanowires appear only among metals with a wider d-band, e.g. in Au and Pt but 
not in Ag and Cu. Helicity is introduced by the surface reconstruction or the atom- 
row slip on the (001) sheet, because the triangular (lll)-like sheet is more preferable 
for d-orbitals extending over the surface. The d-band width in platinum and gold is 
commonly wider than that in lighter elements, Ag and Cu, and the calculated result 
explains why platinum nanowire can be also formed with helicity. 



4-2. Application to many-electron systems : Excitation spectrum of multi- orbital 
extended Hubbard Hamiltonian on two-dimensional square lattice 

The transition metal oxides have been paid a great attention due to their various 
physical properties which are drastically changed and controllable by external fields 
or doping. Here we show an application of the shifted COCG method to the 
extended Hubbard model with doubly degenerated orbital and the inter-site Coulomb 
interaction on a two-dimensional square lattice. [17j This is a model of LaiSriNiC^ 

and we used a finite unit of N = 8 sites and the total number of electrons N c = | N = 
12. 
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Figure 5. The excitation spectra, electron ionization and affinity, of doubly 
degenerated extended Hubbard model on \/8 X V8 two-dimensional square 
lattice. |17] The matrix size of the reduced subspace of the total spin Sz = is 
64, 128, 064. The excitation gap is formed with increasing the inter-site Coulomb 
interaction V. 

We focused our attention to the Hilbert space of the total spin S z = and the 
matrix size is (i 6 C 6 ) 2 = 64, 128, 064. The difficulty in many-electron systems is (1) the 
large dimension of the Hamiltonian matrix which grows exponentially with increasing 
number of atoms linearly and (2) very small energy intervals between adjacent 
eigenenergies which require a difficulty in separation of respective eigenvectors. This 
difficulty requires fast, reliable and stable calculation algorithm for large matrices. 
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Figure [5] shows the excitation spectra of electron ionization and affinity levels and 
the energy gap between these two corresponds to the excitation gap. Normally, the 
Hubbard model of non-degenerate orbital, in case of the integer occupation, gives the 
insulating gap due to the on-site Coulomb interaction and, on the contrary, in case 
of non-integer occupation, the system is metal. Here, in the doubly degenerate case, 
the charge stripe order with an insulator gap is formed due to the inter-site Coulomb 
interaction V and, on top of that, the spin stripe is formed with anisotropy of electron 
hoppings. [17] 

Crucial point is that we should keep very high accuracy of the computation for 
judging the 'gap', compared with the 'level interval' in finite systems and that the 
iteration convergence should be controlled during the iterative calculation. Therefore, 
the capability of convergence (accuracy) monitoring and robustness are seriously 
important and the shifted COCG method can solve this difficulty. 

5. Conclusions 

We have reviewed our recently developed methods for large-scale electronic structure 
calculation applied to both one-electron theory and many-electron theory. For large- 
scale systems of about 10's nm scale, one can use several solver methods simultaneously 
as a multi-solver method. We also explained differences between two theories from 
the viewpoint of large-scale matrix computation. Then we presented examples of 
the applications of nano-scale systems, the formation and propagation of fracture 
in large silicon crystals, the MD simulation in liquid carbon, and the formation 
of helical multishell structure of gold nanowires and an example of many-electron 
problems, the orbital degenerated extended Hubbard model. In these applications, 
we stressed the importance of the hybrid scheme of multiple solver methods and the 
novel computational algorithm. 
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